Background

The following is a preliminary data analysis of nitrate spikes in water after rainfall events at different locations in the Midwest.

Nitrate data comes from the US Geological Survey.

Precipitation data comes from NOAA’s Local Climatological Data.

Since there aren’t USGS monitoring locations in every “hot-spot” county from our previous analysis, we focused on finding locations that met the following criteria:

  1. Monitors the parameter “99133,” nitrates in milligrams per liter.
  2. Is still active today.
  3. Recorded at least one nitrate spike above the federal threshold of 10 mg/L this year.
  4. Has data going back as far as possible.

Part 1

First, we loaded the R libraries we will use, including dataRetrieval, a package that lets us request USGS data directly from R.

library(dataRetrieval)
library(tidyverse)
library(lubridate)
library(rvest)
library(leaflet)
library(leaflet.providers)
library(dygraphs)
library(sp)
library(rgeos)
library(xts)
library(htmlwidgets)
library(data.table)
library(knitr)
library(DT)

Then we ran the following code for each Midwest state. It uses a dataRetrieval function to request all the USGS monitoring locations in a given state, then keeps only the ones that are still active and have nitrate data. This gives us each site's number, coordinates and how long it has been active.

Note: North Dakota is not included below since it does not have any nitrate monitoring locations.

IL_site <- readNWISdata(stateCd= "IL", parameterCd="99133",
                        service="site", seriesCatalogOutput=TRUE) %>% 
  filter(site_tp_cd == "ST") %>%
  filter(end_date == "2021-08-26") %>%
  filter(parm_cd == "99133") %>%
  distinct(site_no, .keep_all = TRUE) %>%
  select(site_no, station_nm, lat = dec_lat_va,
         long = dec_long_va, begin_date) # 12 obs of 5 var

IN_site <- readNWISdata(stateCd= "IN", parameterCd="99133",
                        service="site", seriesCatalogOutput=TRUE) %>% 
  filter(site_tp_cd == "ST") %>%
  filter(end_date == "2021-08-26") %>%
  filter(parm_cd == "99133") %>%
  distinct(site_no, .keep_all = TRUE) %>%
  select(site_no, station_nm, lat = dec_lat_va,
         long = dec_long_va, begin_date) # 10 obs of 5 var

IA_site <- readNWISdata(stateCd= "IA", parameterCd="99133",
                        service="site", seriesCatalogOutput=TRUE) %>% 
  filter(site_tp_cd == "ST") %>%
  filter(end_date == "2021-08-26") %>%
  filter(parm_cd == "99133") %>%
  distinct(site_no, .keep_all = TRUE) %>%
  select(site_no, station_nm, lat = dec_lat_va,
         long = dec_long_va, begin_date) # 11 obs of 5 var

KS_site <- readNWISdata(stateCd= "KS", parameterCd="99133",
                        service="site", seriesCatalogOutput=TRUE) %>% 
  filter(site_tp_cd == "ST") %>%
  filter(end_date == "2021-08-26") %>%
  filter(parm_cd == "99133") %>%
  distinct(site_no, .keep_all = TRUE) %>%
  select(site_no, station_nm, lat = dec_lat_va,
         long = dec_long_va, begin_date) # 4 obs of 5 var

MI_site <- readNWISdata(stateCd= "MI", parameterCd="99133",
                        service="site", seriesCatalogOutput=TRUE) %>% 
  filter(site_tp_cd == "ST") %>%
  filter(end_date == "2021-08-26") %>%
  filter(parm_cd == "99133") %>%
  distinct(site_no, .keep_all = TRUE) %>%
  select(site_no, station_nm, lat = dec_lat_va,
         long = dec_long_va, begin_date) # 3 obs of 5 var

MN_site <- readNWISdata(stateCd= "MN", parameterCd="99133",
                        service="site", seriesCatalogOutput=TRUE) %>% 
  filter(site_tp_cd == "ST") %>%
  filter(end_date == "2021-08-26") %>%
  filter(parm_cd == "99133") %>%
  distinct(site_no, .keep_all = TRUE) %>%
  select(site_no, station_nm, lat = dec_lat_va,
         long = dec_long_va, begin_date) # 0 obs of 5 var

MO_site <- readNWISdata(stateCd= "MO", parameterCd="99133",
                        service="site", seriesCatalogOutput=TRUE) %>% 
  filter(site_tp_cd == "ST") %>%
  filter(end_date == "2021-08-26") %>%
  filter(parm_cd == "99133") %>%
  distinct(site_no, .keep_all = TRUE) %>%
  select(site_no, station_nm, lat = dec_lat_va,
         long = dec_long_va, begin_date) # 3 obs of 5 var

NE_site <- readNWISdata(stateCd= "NE", parameterCd="99133",
                        service="site", seriesCatalogOutput=TRUE) %>% 
  filter(site_tp_cd == "ST") %>%
  filter(end_date == "2021-08-26") %>%
  filter(parm_cd == "99133") %>%
  distinct(site_no, .keep_all = TRUE) %>%
  select(site_no, station_nm, lat = dec_lat_va,
         long = dec_long_va, begin_date) # 0 obs of 5 var

OH_site <- readNWISdata(stateCd= "OH", parameterCd="99133",
                        service="site", seriesCatalogOutput=TRUE) %>% 
  filter(site_tp_cd == "ST") %>%
  filter(end_date == "2021-08-26") %>%
  filter(parm_cd == "99133") %>%
  distinct(site_no, .keep_all = TRUE) %>%
  select(site_no, station_nm, lat = dec_lat_va,
         long = dec_long_va, begin_date) # 2 obs of 5 var

SD_site <- readNWISdata(stateCd= "SD", parameterCd="99133",
                        service="site", seriesCatalogOutput=TRUE) %>% 
  filter(site_tp_cd == "ST") %>%
  filter(end_date == "2021-08-26") %>%
  filter(parm_cd == "99133") %>%
  distinct(site_no, .keep_all = TRUE) %>%
  select(site_no, station_nm, lat = dec_lat_va,
         long = dec_long_va, begin_date) # 3 obs of 5 var

WI_site <- readNWISdata(stateCd= "WI", parameterCd="99133",
                        service="site", seriesCatalogOutput=TRUE) %>% 
  filter(site_tp_cd == "ST") %>%
  filter(end_date == "2021-08-26") %>%
  filter(parm_cd == "99133") %>%
  distinct(site_no, .keep_all = TRUE) %>%
  select(site_no, station_nm, lat = dec_lat_va,
         long = dec_long_va, begin_date) # 0 obs of 5 var
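The eleven blocks above differ only in the state code. As a sketch of how they could be consolidated (assuming the dataRetrieval and tidyverse libraries loaded in Part 1), a single helper can be written once and run per state:

```r
# Sketch: one helper instead of eleven near-identical blocks.
# Assumes library(dataRetrieval) and library(tidyverse) from Part 1.
get_state_sites <- function(state) {
  readNWISdata(stateCd = state, parameterCd = "99133",
               service = "site", seriesCatalogOutput = TRUE) %>%
    filter(site_tp_cd == "ST",
           end_date == "2021-08-26",
           parm_cd == "99133") %>%
    distinct(site_no, .keep_all = TRUE) %>%
    select(site_no, station_nm, lat = dec_lat_va,
           long = dec_long_va, begin_date)
}

# e.g. IL_site <- get_state_sites("IL")
```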

After running the above code, we noticed that, in addition to ND, the states MN, NE and WI do not have monitoring locations that are still active. We then wrote a function that:

  1. Pulls all the site numbers from a given state and requests nitrate data from this year.

  2. Aggregates the data to find the maximum nitrate reading, yearlyPeak.

  3. Creates a new column flagging whether the maximum nitrate level at each location was above 10 mg/L.

  4. Removes sites that did not have a nitrate peak above the federal threshold this year.

  5. Arranges the remaining locations from oldest to newest and selects the top one.

This means the function will choose one location per state whose data goes back as far as possible and that had a nitrate spike above federal guidelines.

state_function <- function(state){
  
  site <- state
  
  # nitrate levels from this year
  iv <- readNWISdata(siteNumbers = site$site_no, parameterCd = "99133",
                     startDate = "2021-01-01", endDate = "2021-08-18",
                     service = "iv")
  
  # now aggregate by yearly max 
  iv$year <- year(iv$dateTime)
  
  yearPeak <- iv %>% 
    group_by(site_no, year) %>%
    summarise(yearlyPeak = max(X_99133_00000, na.rm = T)) %>%
    filter(yearlyPeak <= 40) %>% # filter out wonky readings above 40 mg/L
    select(-year)
  
  # join these to "site", sort by date, pick oldest one above 10mg/L  
  site <- site %>% 
    right_join(yearPeak) %>%
    arrange(begin_date) %>%
    mutate(illegal = case_when(
      yearlyPeak >= 10 ~ "Yes",
      yearlyPeak < 10 ~ "No")) %>%
    filter(illegal == "Yes") %>%
    select(-illegal) %>%
    head(1) # this is our top location at this state
  
  return(site)
}

Now we can run the function on each Midwest state that has an active nitrate monitoring location:

IL <- state_function(IL_site)
IN <- state_function(IN_site)
IA <- state_function(IA_site)
KS <- state_function(KS_site)
MI <- state_function(MI_site)
MO <- state_function(MO_site)
OH <- state_function(OH_site)
SD <- state_function(SD_site)

IA, IL, IN, MI and MO each have at least one monitoring location that meets our filtering criteria. However, the MI and MO records only go back one year.

Hence, we decided to use IA, IL and IN as examples in this analysis.

# combine the 3 left 
usgs <- rbind(IA, IL, IN) # the most recent begin_date among the three is 2015-03-20, from IN

usgs_simple <- usgs %>% select(name = station_nm,
                               lat,
                               long,
                               begin_date) 

usgs_simple$type <- "USGS"

rm(IA_site,IL_site,IN_site,KS_site,MI_site,MN_site,
   MO_site,NE_site,OH_site,SD_site,WI_site,
   KS, MI, MO, SD, IA, IL, IN, OH)

Part 2

NOAA has several precipitation data sets. For this analysis we used their Local Climatological Data.

In order to find the closest weather station to each of the three USGS locations we chose, we scraped the weather stations’ coordinates, because they were not included in the data we downloaded.

First, we used a Chrome plugin to scrape the links for each weather station in Iowa, Illinois and Indiana, the three states where we have our USGS locations. There are 154 weather stations in total across those states.

After we compiled those links, we imported them into R to do the rest of the scraping.

# import csv I manually scraped
airport_links <- read_csv("Data/NOAA/airport_links_clean.csv")

# now make a list with all the links
airport_list <- as.list(airport_links$link)

Then we created another function that scrapes two tables for each station: one contains its coordinates, and the other contains the start and end dates of its data collection.

# make a function that will scrape both tables and join them, uses list index
link_function <- function(index){
  
  station <- read_html(airport_list[[index]])
  
  table <- station %>% html_table()
  
  first_t <- table[[1]]
  second_t <- table[[2]]
  colnames(second_t) <- colnames(first_t)
  combined <- rbind(first_t, second_t)
  
  return(combined)
  
}

Now we can run the function on each link using its index number in the list we imported.

Once those tables were compiled into a single data frame, we transposed it and cleaned up its format.
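The compilation step itself is not shown above. One plausible reconstruction, consistent with the transpose and header cleanup that follow, is to column-bind the two-column table that each link_function() call returns. Here is a toy illustration with hypothetical stand-in tables:

```r
# Hypothetical stand-ins for the two-column tables link_function() returns:
# one column of field names, one column of values, per station.
tables <- lapply(1:3, function(i) {
  data.frame(field = c("Name", "Latitude/Longitude"),
             value = c(paste("Station", i), "40.0°, -88.0°"))
})

# Column-binding them side by side yields the wide frame that the
# transpose step below expects (field-name columns alternating with values).
all_toy <- do.call(cbind, tables)

# With the real list, the equivalent call would be something like:
# all <- do.call(cbind, lapply(seq_along(airport_list), link_function))
```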

# transpose 
all <- as.data.frame(t(all))

# fix header and row names 
colnames(all) <- all[1, ]

all <- all %>% filter(Name != "Name") 

rownames(all) <- NULL

all <- all %>% select(name = Name,
                      lat_long = `Latitude/Longitude`,
                      begin_date = `Start Date¹`,
                      end_date = `End Date¹`)

# clean lat_long columns 

all <- all %>% separate(lat_long, c("lat", "long"), ",")

all$lat <- gsub("°", "", all$lat)
all$long <- gsub("°", "", all$long)

Then we removed the stations that are no longer active; this is what we have left:

# then remove by end date, not active now 

all <- all %>% filter(end_date == "2021-08-23") %>%
  select(-end_date) # 145 obs of 4 var

all$type <- "NOAA"

rm(airport_links, airport_list)

Part 3

Once we had all the active weather stations in IA, IL and IN, we paired that data with our three USGS monitoring locations to find each site's neighbor, or closest corresponding station, using their coordinates.

# join usgs_simple and all
locations <- rbind(usgs_simple, all)
locations$lat <- as.double(locations$lat)
locations$long <- as.double(locations$long)

Here is a map that plots all these:

# map these out
station_color <- colorFactor(c("#139bf0", "#e69f07"), domain=c("NOAA", "USGS"))

leaflet(locations) %>%
  addProviderTiles(providers$Stamen.Toner) %>%
  addCircles(~long, ~lat, popup=locations$name, weight = 3, radius=7000, 
             color=~station_color(type), stroke = F, fillOpacity = 0.8)

For some sites it is easy to see visually which weather station is closest, but just to be sure, we used the following code to calculate each nearest neighbor.

sp.locations <- locations
coordinates(sp.locations) <- ~long+lat

d <- gDistance(sp.locations, byid=T)
min.d <- apply(d, 1, function(x) order(x, decreasing=F)[2])

neighbors <- cbind(locations, locations[min.d,], apply(d, 1, function(x) sort(x, decreasing=F)[2]))

colnames(neighbors) <- c(colnames(locations),"neighbor", "lat2", "long2", "begin_date2", "type2", "apply")

neighbors <- filter(neighbors, type == "USGS") %>%
  select(-apply)

rm(d, sp.locations, min.d, usgs_simple)
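One caveat: gDistance() here operates on raw latitude/longitude, so the distances are in planar degrees rather than true ground distance, which is a reasonable shortcut for ranking nearby stations at this scale. As a sanity check, a great-circle distance can be computed in base R with the haversine formula (a sketch, not part of the original analysis):

```r
# Great-circle distance in kilometers between two lat/long points,
# using the haversine formula and a mean Earth radius of 6371 km.
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(a))
}

# e.g. distance from each USGS site to its matched station:
# haversine_km(neighbors$lat, neighbors$long, neighbors$lat2, neighbors$long2)
```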

Part 4

After requesting precipitation data from NOAA’s LCD website going back to 2015 (as far back as nitrate monitoring goes at our three locations), we also requested nitrate data for the same time frame from within R.

# request USGS data from 2015-03-20 to 2021-08-18
usgs_data <- readNWISdata(siteNumbers = usgs$site_no, parameterCd = "99133",
                   startDate = "2015-03-20", endDate = "2021-08-18",
                   service = "iv")

# clean columns
usgs_data <- usgs_data %>% select(site_no,
                                  datetime = dateTime,
                                  nitrate_lv = X_99133_00000)

usgs_data$date <- as.Date(usgs_data$datetime) # 602,152 obs of 4 var

# separate data by each location 
usgs_IL <- usgs_data %>% filter(site_no == "03336890") # 172,535 obs of 4 var
usgs_IA <- usgs_data %>% filter(site_no == "05482300") # 218,930 obs of 4 var
usgs_IN <- usgs_data %>% filter(site_no == "05524500") # 210,687 obs of 4 var

rm(usgs_data)

Nitrate data is very granular, with readings taken every 15 minutes. So we aggregated it to the maximum daily level, daily_peak, to get an idea of how high nitrates can go on a given day after a precipitation event.

# aggregate to peak nitrate levels per day 

usgs_IL_peak <- usgs_IL %>% group_by(date) %>%
  summarise(daily_peak = max(nitrate_lv, na.rm = T)) # 1826 obs of 2 var

usgs_IA_peak <- usgs_IA %>% group_by(date) %>%
  summarise(daily_peak = max(nitrate_lv, na.rm = T)) # 2331 obs of 2 var

usgs_IN_peak <- usgs_IN %>% group_by(date) %>%
  summarise(daily_peak = max(nitrate_lv, na.rm = T)) # 2304 obs of 2 var

rm(usgs_IA, usgs_IL, usgs_IN)

NOAA’s data is hourly precipitation, so we aggregated it to total daily precipitation for each station. This aggregation is done with another short function we wrote.

# now import the NOAA data I requested on their LCD website

storm_lake_IA <- fread("Data/NOAA/storm_lake.csv") # 166,905 obs of 124 var
rantoul_IL <- fread("Data/NOAA/rantoul.csv") # 167,763 obs of 124 var
jasper_IN <- fread("Data/NOAA/jasper.csv") # 166,952 obs of 124 var

# make a function that will clean up NOAA data
# into the format we need 

precipitation_function <- function(airport){
  
  # remove some columns 
  airport <- airport %>% select(datetime = DATE,
                                      precip_hour = HourlyPrecipitation) 
  
  airport$date <- date(airport$datetime)
  
  # make NA into 0s
  airport <- airport %>% mutate(precip_hour = replace_na(precip_hour, 0))
  
  # aggregate to total daily precip
  airport <- airport %>% group_by(date) %>%
    summarise(total_precip = sum(precip_hour)) 
  
  return(airport)
}

# now run the function to each state

storm_lake_IA <- precipitation_function(storm_lake_IA) # 2331 obs of 2 var
rantoul_IL <- precipitation_function(rantoul_IL) # 2343 obs of 2 var
jasper_IN <- precipitation_function(jasper_IN) # 2335 obs of 2 var

Now that our nitrate data and precipitation data are in the same format, we made interactive graphics with the dygraphs package.

# Create the xts (format different than DF) necessary to use dygraph
nitrates_IA <- xts(x = usgs_IA_peak$daily_peak, order.by = usgs_IA_peak$date)
precipitation_IA <- xts(x = storm_lake_IA$total_precip, order.by = storm_lake_IA$date)
IA_xts <- merge(nitrates_IA, precipitation_IA, join = 'left')
rm(nitrates_IA, precipitation_IA, usgs_IA_peak, storm_lake_IA)

nitrates_IL <- xts(x = usgs_IL_peak$daily_peak, order.by = usgs_IL_peak$date)
precipitation_IL <- xts(x = rantoul_IL$total_precip, order.by = rantoul_IL$date)
IL_xts <- merge(nitrates_IL, precipitation_IL, join = 'left')
rm(nitrates_IL, precipitation_IL, usgs_IL_peak, rantoul_IL)

nitrates_IN <- xts(x = usgs_IN_peak$daily_peak, order.by = usgs_IN_peak$date)
precipitation_IN <- xts(x = jasper_IN$total_precip, order.by = jasper_IN$date)
IN_xts <- merge(nitrates_IN, precipitation_IN, join = 'left')
rm(nitrates_IN, precipitation_IN, usgs_IN_peak, jasper_IN)

# plot
IA <- dygraph(IA_xts, main = "Nitrates and precipitation in IA location, 2015-2021") %>%
  dyAxis("y", label = "Daily Peak Nitrates (mg/L)") %>%
  dyAxis("y2", label = "Total Daily Precipitation (in)", independentTicks = TRUE) %>%
  dySeries("precipitation_IA", axis = 'y2') %>%
  dyRangeSelector(height = 20) %>%
  dyHighlight(highlightCircleSize = 5) %>%
  dyShading(from = 10, axis = "y",to = 50, color = "#FFE6E6") %>%
  dyOptions(colors = c("#a97de3", "#6fc978"), drawGrid = F) %>%
  dyLegend(labelsSeparateLines = T)

IL <- dygraph(IL_xts, main = "Nitrates and precipitation in IL location, 2015-2021") %>%
  dyAxis("y", label = "Daily Peak Nitrates (mg/L)") %>%
  dyAxis("y2", label = "Total Daily Precipitation (in)", independentTicks = TRUE) %>%
  dySeries("precipitation_IL", axis = 'y2') %>%
  dyRangeSelector(height = 20) %>%
  dyHighlight(highlightCircleSize = 5) %>%
  dyShading(from = 10, axis = "y",to = 50, color = "#FFE6E6") %>%
  dyOptions(colors = c("#a97de3", "#6fc978"), drawGrid = F) %>%
  dyLegend(labelsSeparateLines = T)

IN <- dygraph(IN_xts, main = "Nitrates and precipitation in IN location, 2015-2021") %>%
  dyAxis("y", label = "Daily Peak Nitrates (mg/L)") %>%
  dyAxis("y2", label = "Total Daily Precipitation (in)", independentTicks = TRUE) %>%
  dySeries("precipitation_IN", axis = 'y2') %>%
  dyRangeSelector(height = 20) %>%
  dyHighlight(highlightCircleSize = 5) %>%
  dyShading(from = 10, axis = "y",to = 50, color = "#FFE6E6") %>%
  dyOptions(colors = c("#a97de3", "#6fc978"), drawGrid = F) %>%
  dyLegend(labelsSeparateLines = T)

Viz

You can use your mouse to select periods of time, or use the slider below each graph to zoom in on specific days and events.

IA
IL
IN

Stats

In order to quantify the nitrate spikes and how those coincide with precipitation, we had to define the events through code. But first, we downloaded USGS nitrate data for our site in IL. We then made a function to filter by season, only keeping the spring months, defined as April through June.

We decided to analyze the spring data since that is when most farmers apply fertilizer and when some climate assessments, like the IL climate assessment, predict precipitation could keep increasing.

After filtering the spring data, the function then aggregates that data into daily nitrate peaks, as earlier in the analysis.

# request IL data for April-June 2015-2021
# IL: 03336890
spring_IL <- readNWISdata(siteNumber = "03336890", parameterCd = "99133",
                        startDate = "2015-04-01", endDate = "2021-06-29",
                        service = "iv") 

spring_function <- function(spring_state){
  
  # filter by month 
  spring_state$month <- month(spring_state$dateTime)
  
  spring_state <- spring_state %>% filter(month %in% c(4:6))
  
  # clean columns
  spring_state <- spring_state %>% select(site_no,
                                    datetime = dateTime,
                                    nitrate_lv = X_99133_00000)
  
  spring_state$date <- as.Date(spring_state$datetime)
  
  # aggregate by daily peak
  spring_state <- spring_state %>% group_by(date) %>%
    summarise(daily_peak = max(nitrate_lv, na.rm = T)) 
  
  return(spring_state)
  
}

# apply function to IL
spring_IL <- spring_function(spring_IL) # 610 obs

After applying the function to the IL site data, we imported the same precipitation data from NOAA’s closest weather station. This data was also aggregated to total daily precipitation using one of our earlier functions.

Then we also filtered the precipitation data to match the spring months we have from the nitrate data. Our combined table is spikes_IL.

# import noaa for IL
spring_rantoul_IL <- fread("Data/NOAA/rantoul.csv") # 167,763 obs of 124 var

# apply cleaning function, aggregates total daily precipitation 
spring_rantoul_IL <- precipitation_function(spring_rantoul_IL) # 2343 obs of 2 var

# filter to same time range as USGS, by month 
spring_rantoul_IL$month <- month(spring_rantoul_IL$date)

spring_rantoul_IL <- spring_rantoul_IL %>%
  filter(month %in% c(4:6)) %>%
  select(date, total_precip) # 637 obs

# combine them
spikes_IL <- left_join(spring_IL, spring_rantoul_IL) %>% 
  mutate(total_precip = replace_na(total_precip, 0)) # 610 obs

rm(spring_IL, spring_rantoul_IL, IA, IL, IN, IA_xts, IL_xts, IN_xts)

Now that we have nitrate and precipitation data for the last six springs, we have to define what counts as a nitrate spike. A spike occurs on a day when the nitrate level reaches 10 mg/L or more while the previous day’s level was below 10 mg/L.

We started by using a lag function to put the previous day’s value in a new column, then flagged nitrate spikes based on the criteria we defined.

spikes_IL <- spikes_IL %>% 
  mutate(nitrates_24b = lag(daily_peak, 1)) %>% 
  mutate(nitrate_spike = case_when(
  (daily_peak >= 10 & nitrates_24b < 10) ~ TRUE
))
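As a quick sanity check of that lag logic, here is a base R equivalent run on a toy series of hypothetical daily peaks:

```r
# Hypothetical daily peak readings (mg/L)
daily_peak <- c(8, 12, 13, 9, 11)

# Base R equivalent of lag(daily_peak, 1): shift values down by one day
nitrates_24b <- c(NA, head(daily_peak, -1))

# A spike fires only when a day is at or above 10 mg/L
# and the previous day was below 10 mg/L
nitrate_spike <- daily_peak >= 10 & nitrates_24b < 10
nitrate_spike # NA TRUE FALSE FALSE TRUE
```

Days 2 and 5 register as spikes; day 3 stays elevated from the day before, so it does not count as a new spike, which matches the definition above.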

Then, we wanted to see how much it rained the day of the spike, plus the day before, in case a rain event went overnight.

# we are also using a lag function to aggregate those two days of rain into "pre_spike_rain"
spikes_IL <- spikes_IL %>% 
  mutate(precipitation_24b = lag(total_precip, 1)) %>%
  mutate(pre_spike_rain = total_precip + precipitation_24b)

# filter for TRUE spikes 
spikes_IL <- spikes_IL %>% filter(nitrate_spike == T) %>%
  select(date, nitrate_spike, daily_peak, pre_spike_rain) # 39 obs

At the USGS site in IL, we found 39 spike events over the last six springs. We then created a new column that tells us whether there was any rain on the day of the spike or the day before.

# and now filter for any spikes that had any amount of rain (just not 0)
spikes_IL <- spikes_IL %>% mutate(rained = case_when(
  pre_spike_rain != 0 ~ "Yes",
  pre_spike_rain == 0 ~ "No"
))

spikes_IL$rained <- as.factor(spikes_IL$rained)

rainy_spikes_IL <- spikes_IL %>% filter(rained == "Yes") # 34 obs

Filtering by those, we found that 34 of the 39 spikes, or around 87%, had at least some rain.

If we define heavy rain events as 2 inches of precipitation or more, we can split those 34 events into two categories: at or above 2 inches, and below.

above_2_IL <- rainy_spikes_IL %>% filter(pre_spike_rain >= 2) # 10 obs

below_2_IL <- rainy_spikes_IL %>% filter(pre_spike_rain < 2) # 24 obs

Of those 34 rainy spikes, 10 had 2 or more inches of precipitation and 24 had less than 2 inches. You can explore those in more detail in the tables below.

Above 2 inches:

Below 2 inches:

On average, spikes preceded by heavy rain reached higher nitrate levels than those with less than 2 inches of precipitation, although those events are less common.

Here are some stats about both types of events:

summary(above_2_IL)
##       date            nitrate_spike    daily_peak    pre_spike_rain  rained  
##  Min.   :2015-05-31   Mode:logical   Min.   :10.20   Min.   :2.040   No : 0  
##  1st Qu.:2016-09-08   TRUE:10        1st Qu.:11.35   1st Qu.:2.538   Yes:10  
##  Median :2019-05-30                  Median :13.85   Median :3.025           
##  Mean   :2018-09-22                  Mean   :14.17   Mean   :3.691           
##  3rd Qu.:2020-06-16                  3rd Qu.:16.90   3rd Qu.:4.763           
##  Max.   :2021-06-25                  Max.   :19.80   Max.   :7.120
summary(below_2_IL)
##       date            nitrate_spike    daily_peak    pre_spike_rain   rained  
##  Min.   :2015-05-10   Mode:logical   Min.   :10.00   Min.   :0.0300   No : 0  
##  1st Qu.:2016-04-25   TRUE:24        1st Qu.:10.25   1st Qu.:0.2050   Yes:24  
##  Median :2018-04-19                  Median :10.60   Median :0.5600           
##  Mean   :2018-01-19                  Mean   :11.20   Mean   :0.7342           
##  3rd Qu.:2019-09-05                  3rd Qu.:11.62   3rd Qu.:1.2175           
##  Max.   :2021-06-19                  Max.   :15.10   Max.   :1.7400

We also made a pivot table to see the average spike height and the number of spikes per year over the last six springs.

pivot_IL <- rainy_spikes_IL %>% mutate(year = year(date)) %>%
  group_by(year) %>%
  summarise(average_spike = round(mean(daily_peak), 1))

pivot2_IL <- rainy_spikes_IL %>% mutate(year = year(date)) %>% group_by(year) %>% tally()

pivot_IL <- left_join(pivot_IL, pivot2_IL)

rm(pivot2_IL)

Watershed

We understand that one of the limitations of this analysis is the complex hydrology behind these watersheds. As a reference, we used the USGS application StreamStats. However, we noticed that the marker for the monitoring location in Champaign, IL (03336890) has coordinates slightly different from the latitude and longitude attached to the site’s data.

Using the USGS application to delineate the drainage basin at those slightly different coordinates produces a different result, which leaves the airport weather station in Rantoul, IL, either in or out of the watershed.

This is the lat and long next to the marker:

This is the lat and long from the site’s data:

And these are the two different resulting watersheds: